// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Import Model from Apache Spark

Starting with Ignite 2.8, you can import the following Apache Spark ML models:

- Logistic regression (`org.apache.spark.ml.classification.LogisticRegressionModel`)
- Linear regression (`org.apache.spark.ml.regression.LinearRegressionModel`)
- Decision tree (`org.apache.spark.ml.classification.DecisionTreeClassificationModel`)
- Support Vector Machine (`org.apache.spark.ml.classification.LinearSVCModel`)
- Random forest (`org.apache.spark.ml.classification.RandomForestClassificationModel`)
- K-Means (`org.apache.spark.ml.clustering.KMeansModel`)
- Decision tree regression (`org.apache.spark.ml.regression.DecisionTreeRegressionModel`)
- Random forest regression (`org.apache.spark.ml.regression.RandomForestRegressionModel`)
- Gradient boosted trees regression (`org.apache.spark.ml.regression.GBTRegressionModel`)
- Gradient boosted trees (`org.apache.spark.ml.classification.GBTClassificationModel`)

This feature works with models saved in _snappy.parquet_ files.
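Before importing, it can help to confirm that the saved model directory actually contains such files. The sketch below is a minimal check, assuming the usual Spark layout in which a saved model is a directory whose part files end in `.snappy.parquet`. The class `SparkModelFiles` and its method are hypothetical helpers written for this illustration; they are not part of the Ignite or Spark API.

[source, java]
----
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Ignite): inspects a model directory saved by Spark ML. */
public class SparkModelFiles {
    /**
     * Walks the model directory recursively and returns every regular file
     * whose name ends with ".snappy.parquet", sorted by path.
     */
    public static List<Path> findSnappyParquetFiles(Path mdlDir) throws IOException {
        try (Stream<Path> files = Files.walk(mdlDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".snappy.parquet"))
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
----

An empty result suggests that the path is wrong or that the model was written with a different compression codec, in which case the import is unlikely to succeed.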

Supported and tested Spark version: 2.3.0.
Models saved by other Spark versions (2.1, 2.2, 2.3, 2.4) might also work.

To get a model from Spark ML, save the model produced by training in Spark ML to Parquet files, as in the example below:


[source, scala]
----
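import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

val spark: SparkSession = TitanicUtils.getSparkSession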

val passengers = TitanicUtils.readPassengersWithCasting(spark)
    .select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age")

// Step - 1: Make Vectors from dataframe's columns using special VectorAssembler
val assembler = new VectorAssembler()
    .setInputCols(Array("pclass", "sibsp", "parch", "survived"))
    .setOutputCol("features")

// Step - 2: Transform the dataframe into a vectorized dataframe, dropping rows with missing values
val output = assembler.transform(
    passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age"))
).select("features", "age")


// Configure linear regression to predict "age" from the assembled features
val lr = new LinearRegression()
    .setMaxIter(100)
    .setRegParam(0.1)
    .setElasticNetParam(0.1)
    .setLabelCol("age")
    .setFeaturesCol("features")

// Fit the model
val model = lr.fit(output)
model.write.overwrite().save("/home/models/titanic/linreg")
----


To load a model in Ignite ML, call the `parse()` method of the `SparkModelParser` class with the model path and the model type, for example:


[source, java]
----
DecisionTreeModel mdl = (DecisionTreeModel)SparkModelParser.parse(
   SPARK_MDL_PATH,
   SupportedSparkModels.DECISION_TREE
);
----

More examples of this API are available in the examples module, in the package `org.apache.ignite.examples.ml.inference.spark.modelparser`.

NOTE: Loading from a Spark `PipelineModel` is not supported.
Intermediate feature transformers from Spark are not supported either, because preprocessing works differently in Ignite and in Spark.

